ABSTRACT


Interactive programming environments for languages offer many advantages 
over traditional batch-oriented ones, such as immediate static analysis. One form 
of analysis is type checking, yet type checking in this setting for languages with 
common features like overloading has received little attention. 

We implement an interactive type checker for the polymorphic type system 
of ML with overloading. The implementation was produced automatically from an 
attribute grammar using the Synthesizer Generator, an attribute evaluator generator. 
Type inference is then accomplished via attribute evaluation, so that if the
evaluation is done incrementally, type inference becomes incremental as well.

I. INTRODUCTION


In this thesis, we assume the reader is familiar with basic type theory and its 
associated notational conventions. We also assume a general familiarity with the 
concepts and notation of the lambda-calculus. A comprehensive presentation of 
these concepts can be found in the texts of Thompson [Tho91] and Gunter [Gun92]. 

Interactive programming environments offer significant advantages in programmer
effectiveness and in the use of system resources. For example, extensive
context-sensitive type checking is a valuable tool during program development.
Recognizing type errors immediately at this stage could substantially improve the
quality and reliability of software products, and system resources would be
preserved by avoiding unnecessary recompilations. Perhaps more significantly, such
an environment lets programmers focus on the fundamental aspects of a problem with
a much higher degree of continuity.

The study of type inference is integral to this effort. Though significant advances
have been made in this research area, further work needs to be done. This thesis
considers a suitable type system for implementing a polymorphic programming
language with overloading. Using this type system, an implementation is produced
that performs incremental type inference in an interactive environment.

One can argue that System ML represents the current state of the art in type
systems. It is a polymorphic type system but prohibits the use of overloading. Yet
the need for overloading in programming languages is well known. Current imperative
languages, such as Ada and C++, and even the functional language Standard ML, allow
an identifier to represent different types, but the resulting programs merely
contain monomorphic instances overloaded on an identifier's name. A process called
overloading resolution is required to assign a particular type to an identifier
based on its context. Consider the following expressions, where + is defined over
integers and reals:

    (a) 1+2
    (b) 1.0+2.0


What is the type of +? We only know that it can have the type
$\mathit{int} \to \mathit{int} \to \mathit{int}$ or the type
$\mathit{real} \to \mathit{real} \to \mathit{real}$. But we can reliably assign
neither of these types to + without first examining its context. In a polymorphic
language like ML, we can assign + the type
$\forall\alpha.\,\alpha \to \alpha \to \alpha$, but this results in + having too
many types. On the other hand, if we assign + the type
$\mathit{real} \to \mathit{real} \to \mathit{real}$ we preclude its use in
expression (a). We will examine these issues in more detail in Chapter II.

What is needed is a means to express a type for + which encompasses all of its
possible types and no more. We can do this with the use of constrained type
schemes. We can then assign to any occurrence of +, regardless of its context, the
type $\forall\alpha\ \mathbf{with}\ (+ : \alpha \to \alpha \to \alpha).\ \alpha \to \alpha \to \alpha$.
This means that + can assume any finite type $\alpha \to \alpha \to \alpha$, with
$\alpha$ instantiated to any particular type for which + is defined.

Using the concept of constrained type schemes, an extension to System ML
incorporating overloading, called ML$_o$, has been developed. The associated type
inference algorithm $W_o$ infers principal types for expressions in ML$_o$. It
turns out that, unless we place restrictions on the kinds of overloadings we can
express using constrained type schemes, typability in ML$_o$ is undecidable. In
Chapter IV we consider a form of overloading called parametric overloading, which
makes typability in ML$_o$ decidable, and present an algorithm which determines
satisfiability of constraints with respect to a parametric assumption set.


A. IMPLEMENTING $W_o$


$W_o$ performs batch type inference. In this respect, it is unsuitable for direct
incorporation into a useful interactive programming environment. What is needed
is an incremental approach to type inference which will provide immediate feedback
to the programmer when type errors are encountered.

One might attempt to rewrite $W_o$ to achieve incremental type inference. Our
approach is instead to use the formalism of attribute grammars to express $W_o$.
In this setting type inference is performed via attribute evaluation. As
expressions are input, a corresponding change is reflected in the attribution. If
we are able to perform attribute evaluation incrementally, type inference can also
be done incrementally. Furthermore, incrementality is implicit in the formalism.

We present an implementation of $W_o$ using an attribute grammar in SSL, the
language of the Synthesizer Generator of GrammaTech. The Synthesizer Generator is
an attribute evaluator generator that takes as input a set of attribute equations
and returns as output an attribute evaluator or, in our case, a type checker. By
using the Synthesizer Generator for our implementation we are able to produce not
only an attribute evaluator, but one in which attribute evaluation is done
incrementally. As a result, we are able to achieve both attribute evaluation and
type inference in an incremental setting. Chapters IV and V discuss details of the
implementation and the algorithms used.





II. TYPE SYSTEMS


The concept of type systems in programming languages deals with a set of rules
which, when applied to terms of a language, produce types for those terms. The
notion of types in programming languages has been given steadily increasing
importance over the past several years. It is clear that languages with rich type
classes offer programmers more flexibility in modeling real-world objects. Yet
there remains a significant lack of consensus as to what types are. As consensus in
this area is critical to the successful application of type theory to practical
implementations of new programming environments, this chapter outlines the most
important aspects of type systems and their application to this thesis.

A. WHAT IS A TYPE?


When discussing types, there is a tendency to confuse implementation issues with
the underlying nature of types in general. Actual machines, for example, provide
relatively few types (e.g. integers, floating-point numbers and pointers). The
implementation of types in a high-level language, while posing some very real
problems in the area of compiler design, should remain distinct from a discussion
of type correctness in the higher context of the meaning of types. With reference
to implementation issues, referred to as Reductionist type correctness, Smith
states:

    The key issue is how to protect the representation from misuse. [Smi91]

In this thesis, we will not concern ourselves with the reductionist view of types.
Rather, we will view a type as an algebra, a set of values and operations such that
the set is closed under these operations. For example, type int is the set of
integers together with the usual arithmetic operations, but the set of natural
numbers and the predecessor operation do not form a type. This view gives us a
fundamental basis from which to discuss the meaning and usage of types in
programming languages unencumbered by implementation issues. Operations of an
algebra are axiomatized, providing a semantics that one can use to reason about
programs in which they occur. In order to use the axioms, however, it may be
necessary to restrict the types of certain program arguments to the algebras in
question. For example, if we are to prove that a function adds 1 to its argument
then we might wish to fix the type of its argument to int, say. For some programs,
though, reasoning can proceed without fixing argument types. Such programs are
called polymorphic.

B. POLYMORPHISM


Polymorphic means having many forms. With respect to programming languages, this
refers to programs or terms which have many types, or can operate on values of many
types. Perhaps more intuitively, the purpose of polymorphism is to allow programs
which use a single name to operate on many different types of inputs and, perhaps,
produce different types of output.

We will first be concerned with a form of polymorphism called parametric
polymorphism, where polymorphic entities can be described by a universally
quantified formula with all quantification at the outermost level (e.g.
$\forall\alpha.\,\alpha \to \alpha$). In Figure 2.1, we give an example of a
function, length, defined in a generic polymorphic programming language. We can
ascribe to length the type $\forall\alpha.\,\mathit{list}(\alpha) \to \mathit{int}$.
Its meaning is a function which, given a list, computes its length.

    function length(x)
    {
        if not null(x) then
            1 + length(tail(x))
        else
            0
    }

Figure 2.1: Polymorphic length function

Languages which do not support polymorphism put unnecessary restrictions on the use
of a function. Consider the Pascal program in Figure 2.2. The function min has the
type $\mathit{int} \times \mathit{int} \to \mathit{int}$. Yet there is nothing
inherent in min which depends on integer. Replacing integer with char would yield a
correct Pascal program with meaning corresponding to the lexicographic ordering of
characters.

It is not uncommon for the claim to be made that Ada is a polymorphic programming
language, as in [ASU86]. One might argue that it is, but only weakly so. Through
the use of generics, one can define a template for representing what appears to be
a polymorphic function. In the example of Figure 2.3, one might wish to ascribe the
type $\forall\alpha.\,\alpha \to \alpha \to \alpha$ to the Ada function min within
the generic package MIN_PKG. This would indicate that min is defined over all
instantiations of $\alpha$, including int and char. This is not the case, for a
generic package cannot be used directly in Ada. It must first be instantiated for a
particular type so that it can be properly type checked. Though the language
provides constructs for expressing polymorphism, the resulting compiled program
merely contains monomorphic instances of the function overloaded on the identifier
min. Research into providing polymorphism in an imperative language is ongoing
[Car87].

    function min(x, y : integer) : integer;
    begin
        if x < y then
            min := x
        else
            min := y
    end;

Figure 2.2: Pascal min function


    generic
        type ITEM is private;
        with function "<"(left, right : ITEM) return BOOLEAN is <>;
    package MIN_PKG is
        function min(x, y : ITEM) return ITEM;
    end MIN_PKG;

    package body MIN_PKG is
        function min(x, y : ITEM) return ITEM is
        begin
            if x < y then
                return x;
            else
                return y;
            end if;
        end min;
    end MIN_PKG;

Figure 2.3: Ada generic min function


It is clear that parametric polymorphism is a desirable property of practical
programming languages. Yet, in practice, situations arise where parametric
polymorphism alone cannot provide us with the means to express certain types
adequately. Consider a polymorphic type for min in Figure 2.2. Clearly it is
meaningful for multiple types. However, if we ascribe the type
$\forall\alpha.\,\alpha \to \alpha \to \alpha$ to min, terms without meaning, such
as min(true, false), become typable. It can be seen that min depends on "<" being
defined over its parameters. What is needed is the ability to restrict use of min
to input types whose values are partially ordered. In other words, we need to be
able to overload "<" so that min is polymorphic yet bounded in the types of
arguments to which it can be applied, a form of bounded polymorphism.
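
As a rough illustration of bounded polymorphism, the following Python sketch makes
the dependence of min on "<" explicit by passing the overloading as a table of
instances. The names `LESS_THAN` and `bounded_min` are hypothetical and introduced
only for this sketch. The check here happens at run time, whereas a type system of
the kind discussed in this thesis would reject the ill-typed call statically.

```python
# Hypothetical table of "<" instances, keyed by argument type.
# Only the types listed here are considered to have an ordering.
LESS_THAN = {
    int: lambda a, b: a < b,
    str: lambda a, b: a < b,  # lexicographic ordering
}

def bounded_min(x, y):
    """min restricted to argument types for which "<" has an instance."""
    if type(x) is not type(y) or type(x) not in LESS_THAN:
        raise TypeError(f'no instance of "<" for {type(x).__name__}')
    return x if LESS_THAN[type(x)](x, y) else y

print(bounded_min(3, 1))      # 1
print(bounded_min("b", "a"))  # a
```

Calling `bounded_min(True, False)` raises `TypeError` in this sketch because no
instance of "<" was registered for booleans, mirroring the intent that
min(true, false) should not be typable.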

C. OVERLOADING


The common view of overloading is stated as follows:

    An overloaded symbol is one that has different meanings depending on
    its context [ASU86] (emphasis added).

This process of determining the meaning of an expression by examining its context
is called overloading resolution. This is, in fact, the usual way of treating
overloaded symbols in a program: the local context of an overloaded symbol must
determine a particular overloading to be used at each occurrence. This kind of
treatment is used even in the polymorphic language ML. In fact, any overloading
that requires overloading resolution to determine its meaning is termed an
incoherent overloading and gives rise to potential semantic ambiguity. For example,

$$
\left\{
\begin{array}{l}
* : \mathit{real} \to \mathit{real} \to \mathit{real}, \\
* : \forall\alpha.\,\mathit{matrix}(\alpha) \to \mathit{matrix}(\alpha) \to \mathit{matrix}(\alpha)
\end{array}
\right\}
\tag{2.1}
$$

is an incoherent overloading of the operator *, where * stands for real
multiplication and matrix multiplication.

Consider a term $\lambda x.\,\lambda y.\,x * y$. We can infer two different types
for it: $\mathit{real} \to \mathit{real} \to \mathit{real}$ and
$\forall\alpha.\,\mathit{matrix}(\alpha) \to \mathit{matrix}(\alpha) \to \mathit{matrix}(\alpha)$.
We must apply the process of overloading resolution to determine the meaning of
the term.

A more desirable form of overloading, called coherent overloading, arises when an
overloading is constructed in such a way that its various instances share a common
semantics. In this case, overloading resolution is not required to ascribe a unique
meaning to terms. A term's meaning is uniquely determined from an inspection of the
axioms for the operators occurring in it. For example, suppose * is commutative. We
can readily see that our overloading in (2.1) is incoherent: although we can derive
from (2.1) that $\lambda x.\,\lambda y.\,x * y$ has type
$\mathit{real} \to \mathit{real} \to \mathit{real}$ and
$\forall\alpha.\,\mathit{matrix}(\alpha) \to \mathit{matrix}(\alpha) \to \mathit{matrix}(\alpha)$,
matrix multiplication is not commutative. If we replace our second assumption on *
with $* : \mathit{int} \to \mathit{int} \to \mathit{int}$, with the meaning of
integer multiplication, the overloading becomes coherent, for both integer and real
multiplication are commutative. So we know that, regardless of the types of x and
y, the function is commutative.
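
The commutativity argument can be spot-checked numerically. The Python sketch below
is only an illustration: `times` is a hypothetical stand-in for the overloaded
operator *, matrices are encoded as lists of rows, and testing a few sample values
demonstrates a counterexample but does not prove commutativity in general.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def times(x, y):
    """Overloaded *: matrix product for lists of rows, numeric product otherwise."""
    return mat_mul(x, y) if isinstance(x, list) else x * y

def commutes(x, y):
    """Check x * y == y * x for one pair of sample values."""
    return times(x, y) == times(y, x)

print(commutes(2.5, 4.0))                            # True  (real instance)
print(commutes(3, 7))                                # True  (integer instance)
print(commutes([[1, 2], [3, 4]], [[0, 1], [1, 0]]))  # False (matrix instance)
```

The matrix instance breaks the shared axiom, which is why an overloading that
includes it cannot be coherent with respect to commutativity.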


As will be seen, while our implementation of $W_o$ does not prohibit the
introduction of incoherent overloadings, our assumption is that all overloadings
are coherent. If this assumption is invalid with respect to a particular
overloading, types will still be correctly inferred for expressions involving that
overloading. However, the guarantee that the meaning of such an expression is
uniquely determined is lost.

Surprisingly, it is common in current languages to introduce incoherent
overloadings regardless of the potential for semantic ambiguities. In Ada, for
example, the operator "/" is overloaded with the different meanings of integer and
floating-point division.

Overloadings allowed in most languages, including Ada, C++ and Standard ML, are
restricted to being finite. In the ML$_o$ type system this restriction is lifted.
For example, we can represent an infinite overloading over lists under equality as

$$\forall\alpha\ \mathbf{with}\ {=} : \alpha \to \alpha \to \mathit{bool}.\ \mathit{list}(\alpha) \to \mathit{list}(\alpha) \to \mathit{bool}.$$

In this case, if = has an instance at $\tau \to \tau \to \mathit{bool}$, then it
also has an instance at
$\mathit{list}(\tau) \to \mathit{list}(\tau) \to \mathit{bool}$.





III. THE ML TYPE SYSTEM AND
OVERLOADING


In this chapter we consider an extension of a Curry-style typed lambda calculus
($\lambda_{\to}$) with type schemes, called System ML. As mentioned previously, a
type scheme represents parametric polymorphism, implying that all quantification
must be outermost, or shallow. Research aimed at removing this restriction is
described in [Lei83, McC84, KT90].

A free identifier may be denoted as having infinitely many types via a type scheme.
For instance, the primitive LISP operation hd may be given the type
$\forall\alpha.\,\mathit{seq}(\alpha) \to \alpha$, which would indicate that for
any choice of $\alpha$, say $\tau$, hd has the type
$\mathit{seq}(\tau) \to \tau$.

System ML preserves the property of principal types: every typable term has a
principal type, one that is more general than any other type derivable for the
term. For instance, the term $\lambda f.\,\lambda x.\,f x$, with $f$ and $x$
occurring free in the body $f x$, would have as principal type
$\forall\alpha.\forall\beta.\,(\alpha \to \beta) \to (\alpha \to \beta)$. This is
regarded as the most general typing for this expression. This means that any type
whatsoever of $\lambda f.\,\lambda x.\,f x$ can be derived from the type
$\forall\alpha.\forall\beta.\,(\alpha \to \beta) \to (\alpha \to \beta)$ by
suitably instantiating $\alpha$ and $\beta$; formally, we say that all the types of
$\lambda f.\,\lambda x.\,f x$ are instances of the principal type. The existence of
principal types means that a type inference algorithm will always compute a unique
"best" type for a program.

In order to retain principal types, lambda abstraction in System ML, as in
$\lambda_{\to}$, is monomorphic. This means that lambda-bound identifiers within a
$\lambda$-expression cannot be assigned multiple types. Consider the expression
$(\lambda x.\,x(\lambda y.\,y))\,\lambda x.\,x$. This expression is typable in
System ML with principal type $\forall\alpha.\,\alpha \to \alpha$. This conforms to
the restriction on lambda abstraction since $x$, while being able to assume
infinitely many values, has polymorphic type $\forall\alpha.\,\alpha \to \alpha$.
The restriction is manifest when an attempt at self-application is made within a
$\lambda$-expression. For example, a term such as
$(\lambda y.\,y y)\,\lambda x.\,x$ is illegal in System ML. Here, $y$ must be able
to assume two different types, $(\alpha \to \alpha)$ and $\alpha$, for some
particular $\alpha$. This results in the term $\lambda y.\,y y$ having type
$\forall\alpha.\,(\forall\beta.\,\beta \to \beta) \to \alpha$, which is not a
principal type.

In order to allow free identifiers denoting polymorphic values to be assigned
multiple types, one uses the let construct. The above expression can then be
represented as $\mathbf{let}\ y = \lambda x.\,x\ \mathbf{in}\ y y$. This involves
no inner quantification, since each instance of $y$ is replaced with
$\lambda x.\,x$ in determining the type for $y y$.

System ML, like $\lambda_{\to}$, has a decidable typability problem. In other
words, if a type exists for a program (there may be more than one), the type
inference algorithm will be able to infer a correct type for it. Conversely, if a
type does not exist, the algorithm is capable of making that determination. System
ML is also widely accepted and has been incorporated into mainstream languages like
Standard ML [HMM86] and Miranda [Tur86]. Yet an obvious and practical limitation
exists in System ML: it prohibits overloading by restricting the number of
assumptions per identifier in a type assumption set to at most one. Milner himself
comments in his 1978 paper [Mil78] that allowing more than one assumption is
desirable.

An extension to the ML type system, called ML$_o$, has been developed [VoS91]. It
retains principal types and allows overloading. Deviations from System ML include
the introduction of constrained type schemes and modifications to the type
instantiation and generalization rules. Many extensions of System ML have been
proposed to incorporate overloading. Among these are the systems of [Kae88, CDO91,
Smi91, Kae92, Jon92] and those related to the development of the functional
programming language Haskell [WaB89, CHO92, NiP93]. All of these type systems share
the notion of a constrained type scheme in various forms. A critique of these type
systems is given in [Vol93b].

A. ML$_o$


Given a set of type variables ($\alpha, \beta, \gamma, \ldots$) and a set of type
constructors ($\mathit{int}$, $\mathit{real}$, $\mathit{bool}$, $\mathit{list}$,
$\ldots$) of various arities, the set of unquantified types is defined by:

$$\tau ::= \alpha \mid \tau \to \tau' \mid \chi(\tau_1, \ldots, \tau_n)$$

The set of quantified types or type schemes, then, is defined by

$$\sigma ::= \forall \alpha_1, \ldots, \alpha_n\ \mathbf{with}\ x_1 : \tau_1, \ldots, x_m : \tau_m\,.\ \tau$$

where $\alpha_1, \ldots, \alpha_n$ is the set of quantified variables of $\sigma$,
$x_1 : \tau_1, \ldots, x_m : \tau_m$ is the set of constraints on $\sigma$, and
$\tau$ is the body of $\sigma$. If there are no quantified variables, the
"$\forall$" may be omitted. If there are no constraints, the "with" may be omitted.
In our terminology, $\sigma$ will always be reserved to represent a type scheme,
$\bar{\alpha}$ denotes an abbreviation for $\alpha_1, \ldots, \alpha_n$ and $C$
will be used to represent a list of constraints. The most general form of a type
scheme is then:

$$\sigma ::= \forall\bar{\alpha}\ \mathbf{with}\ C.\ \tau$$

A substitution is a set of replacements for type variables applied simultaneously
to all type variables. For example:

$$[\alpha_1, \ldots, \alpha_n := \tau_1, \ldots, \tau_n]$$

is a substitution where all of the $\alpha_i$'s are distinct. The substitution is
applied to a type $\tau$ by simultaneously replacing all of the $\alpha_i$'s with
the corresponding $\tau_i$'s. The application of substitution $S$ to type $\tau$
will be denoted by $\tau S$.
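Two new type assignment rules, ($\forall$-intro) and ($\forall$-elim), are given in
Figure 3.1; these represent extensions to System ML developed to accommodate
overloading. It should also be noted that if the constraint list $C$ is empty,
these two extensions are identical to type generalization and instantiation in
System ML [Mil78, DaM82].

(hypoth)  $A \vdash x : \sigma$, if $x : \sigma \in A$

($\to$-intro)  $\dfrac{A_x \cup \{x : \tau\} \vdash M : \tau'}{A \vdash \lambda x.\,M : \tau \to \tau'}$

($\to$-elim)  $\dfrac{A \vdash M : \tau \to \tau', \quad A \vdash N : \tau}{A \vdash (M\,N) : \tau'}$

(let)  $\dfrac{A \vdash M : \sigma, \quad A_x \cup \{x : \sigma\} \vdash N : \tau}{A \vdash \mathbf{let}\ x = M\ \mathbf{in}\ N : \tau}$

($\forall$-intro)  $\dfrac{A \cup C \vdash M : \tau', \quad A \vdash C[\bar{\alpha} := \bar{\tau}]}{A \vdash M : \forall\bar{\alpha}\ \mathbf{with}\ C.\ \tau'}$  ($\bar{\alpha}$ not free in $A$)

($\forall$-elim)  $\dfrac{A \vdash M : \forall\bar{\alpha}\ \mathbf{with}\ C.\ \tau', \quad A \vdash C[\bar{\alpha} := \bar{\tau}]}{A \vdash M : \tau'[\bar{\alpha} := \bar{\tau}]}$

Figure 3.1: System ML$_o$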


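Consider a term $M = \lambda x.\,\lambda y.\,((x * x) = y)$ which contains free
identifiers * and =, and the following assumption set:

$$
A = \left\{
\begin{array}{l}
* : \mathit{real} \to \mathit{real} \to \mathit{real}, \\
* : \mathit{int} \to \mathit{int} \to \mathit{int}, \\
{=} : \mathit{int} \to \mathit{int} \to \mathit{bool}, \\
{=} : \forall\alpha\ \mathbf{with}\ * : \alpha \to \alpha \to \alpha,\ {=} : \alpha \to \alpha \to \mathit{bool}.\ \mathit{list}(\alpha) \to \mathit{list}(\alpha) \to \mathit{bool}
\end{array}
\right\}
$$

Here we show a derivation of

$$A \vdash \lambda x.\,\lambda y.\,((x * x) = y) : \forall\alpha\ \mathbf{with}\ * : \alpha \to \alpha \to \alpha,\ {=} : \alpha \to \alpha \to \mathit{bool}.\ \alpha \to \alpha \to \mathit{bool}$$

in ML$_o$. To keep the steps readable, $C$ abbreviates the constraint set
$\{* : \alpha \to \alpha \to \alpha\} \cup \{{=} : \alpha \to \alpha \to \mathit{bool}\}$
and $\Gamma$ abbreviates $A \cup C \cup \{x : \alpha\} \cup \{y : \alpha\}$.

(1)  $\Gamma \vdash x : \alpha$  (hypoth)
(2)  $\Gamma \vdash * : \alpha \to \alpha \to \alpha$  (hypoth)
(3)  $\Gamma \vdash (*\,x) : \alpha \to \alpha$  ($\to$-elim)
(4)  $\Gamma \vdash (x * x) : \alpha$  ($\to$-elim)
(5)  $\Gamma \vdash y : \alpha$  (hypoth)
(6)  $\Gamma \vdash {=} : \alpha \to \alpha \to \mathit{bool}$  (hypoth)
(7)  $\Gamma \vdash {=}\,(x * x) : \alpha \to \mathit{bool}$  ($\to$-elim)
(8)  $\Gamma \vdash (x * x) = y : \mathit{bool}$  ($\to$-elim)
(9)  $A \cup C \cup \{x : \alpha\} \vdash \lambda y.\,((x * x) = y) : \alpha \to \mathit{bool}$  ($\to$-intro)
(10) $A \cup C \vdash \lambda x.\,\lambda y.\,((x * x) = y) : \alpha \to \alpha \to \mathit{bool}$  ($\to$-intro)
(11) $A \vdash C[\alpha := \mathit{int}]$  (hypoth)
(12) $A \vdash \lambda x.\,\lambda y.\,((x * x) = y) : \forall\alpha\ \mathbf{with}\ * : \alpha \to \alpha \to \alpha,\ {=} : \alpha \to \alpha \to \mathit{bool}.\ \alpha \to \alpha \to \mathit{bool}$  ($\forall$-intro)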


We are required to introduce assumptions about * and = in our derivation in order
to arrive at a type for $M$. However, for our derivation to succeed, we need to be
able to discharge those assumptions via the $\forall$-intro rule. This ensures that
$M$ can be derived from only our initial assumption set $A$. The $\forall$-intro
rule deviates from generalization in System ML in that it requires that the
constraint set $C$ be derivable from the initial assumption set $A$. This ensures
that $C$ is satisfiable with respect to $A$. In our derivation, we can see from
(11) that satisfiability is achieved by substituting $\mathit{int}$ for $\alpha$.
In general, there can be more than one finite type which satisfies this
requirement. For example, if = were defined for reals, both $\mathit{int}$ and
$\mathit{real}$ could be used in our substitution for $\alpha$. Conversely, it is
not always possible to achieve satisfiability. For instance, if we removed the
second assumption on * from $A$, our derivation would end at (10). There would be
no single substitution for $\alpha$ which could satisfy the overlapping constraint
requirements in (11) and we would conclude that $M$ is untypable with respect to
$A$.
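
The satisfiability check in step (11) can be sketched as a small backtracking
search. This Python sketch makes two simplifying assumptions that are not part of
the thesis: only the monomorphic assumptions of $A$ are considered (the constrained
scheme for = over lists is left out), and types are encoded with a plain string for
a type variable and a tuple `(constructor, arg1, ..., argN)` otherwise. The names
`match`, `satisfy` and `arrow` are hypothetical.

```python
INT, REAL, BOOL = ("int",), ("real",), ("bool",)

def arrow(*tys):
    """Right-nested function type: arrow(a, b, c) encodes a -> (b -> c)."""
    *init, result = tys
    for t in reversed(init):
        result = ("->", t, result)
    return result

def match(pattern, ground, s):
    """Extend s so that pattern under s equals the ground type, else None."""
    if isinstance(pattern, str):
        if pattern in s:
            return s if s[pattern] == ground else None
        return {**s, pattern: ground}
    if pattern[0] != ground[0] or len(pattern) != len(ground):
        return None
    for p, g in zip(pattern[1:], ground[1:]):
        s = match(p, g, s)
        if s is None:
            return None
    return s

def satisfy(constraints, assumptions, s=None):
    """Find one substitution making every constraint an assumption, else None."""
    s = {} if s is None else s
    if not constraints:
        return s
    (x, ty), rest = constraints[0], constraints[1:]
    for y, ground in assumptions:
        if x == y:
            s2 = match(ty, ground, s)
            if s2 is not None:
                result = satisfy(rest, assumptions, s2)
                if result is not None:
                    return result
    return None

A = [("*", arrow(REAL, REAL, REAL)),
     ("*", arrow(INT, INT, INT)),
     ("=", arrow(INT, INT, BOOL))]
C = [("*", arrow("a", "a", "a")), ("=", arrow("a", "a", BOOL))]

print(satisfy(C, A))             # {'a': ('int',)}
print(satisfy(C, [A[0], A[2]]))  # None: without * on int, C is unsatisfiable
```

The search first tries the real instance of *, fails on =, and backtracks to the
integer instance, which is the substitution used in step (11).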

This requirement for satisfiability of constraint sets ensures that the type system
ML$_o$ is sound. It is interesting to compare ML$_o$ to a similar extension to
System ML proposed by Kaes [Kae88] based on type kinds, where a type kind is a
universe of types over which a type variable may be quantified. It proposed a
restricted form of overloading which is generally the same restriction adopted by
ML$_o$. However, this type system turns out to be unsound in that it does not
enforce satisfiability of constraint sets as outlined above. This results in terms
with multiple non-overlapping constraints being deemed typable in some instances.
In the last example of the previous paragraph, for instance, the term $M$ would be
deemed typable. On the other hand, the similar work of [CDO91], in an effort to
relax the restrictions on overloading in Kaes's type system, enforces
satisfiability and hence remains sound.

We have shown, by example, the process required to determine the typability of a
term in ML$_o$. This process can be described as a modification to the concept,
used in System ML, of strong type inference [Tiu90]. Formally, strong type
inference says that a term $M$ is typable with respect to an assumption set $B$ if
$A \vdash M : \sigma$ is derivable for some type $\sigma$ and $B \subseteq A$. This
criterion turns out to be less restrictive than required in the presence of
overloading. We are free, under strong type inference, to choose any assumption set
$A$ which contains $B$. Returning to our derivation, it can be seen that, in step
(11), we would have the freedom to introduce any new assumptions we required in
order to satisfy typability under strong type inference, resulting in untypable
terms being deemed typable. Strong type inference relies on the premise that
assumption sets may contain at most one assumption per identifier. This premise, of
course, does not hold in ML$_o$. We can then view typability in ML$_o$ as being
that of strong type inference with the requirement that $B = A$. In other words,
$B \vdash M : \sigma$ must be derivable for some type $\sigma$.

IV. TYPE INFERENCE IN SYSTEM ML$_o$


An algorithm for ML$_o$, based on $W$ of System ML, has been developed and named
$W_o$ [Smi91]. In this chapter, we will discuss type inference using $W_o$, which
is given in Figure 4.1.

$W_o$ infers principal types for typable expressions in ML$_o$, failing on
untypable expressions. Given assumption set $A$ and expression $e$, $W_o(A, e)$
returns $(S, B, \tau)$. $S$ is a substitution such that
$AS \cup B \vdash e : \tau$ is derivable. $B$ represents a set of constraints on
$A$, which describe dependencies associated with overloaded identifiers occurring
in $e$, needed to arrive at a type for $e$. $W_o$, unlike $W$, uses the least
common generalization (LCG) of an identifier overloaded in $A$. This concept, along
with the functions $\mathit{close}(A, B, \tau)$ and $\mathit{unify}(\tau, \tau')$,
we will examine in some detail in this chapter.

The LCG of an overloaded identifier can, perhaps, be best described by beginning
with an example. Consider the identifier *, overloaded in $A$ with the assumptions
$* : \mathit{int} \to \mathit{int} \to \mathit{int}$,
$* : \mathit{real} \to \mathit{real} \to \mathit{real}$ and
$* : \mathit{int} \to \mathit{real} \to \mathit{real}$. We can see that all of
these assumptions have in common second and third arguments which are identical.
There is no common ground in their structure with respect to their first arguments.
We can describe their common structure by the use of two quantified type variables,
one for the first argument and another for the remaining two. We would then assign
as the LCG of * the type $\forall\alpha, \beta.\,\alpha \to \beta \to \beta$.
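
The computation in this example can be sketched with anti-unification, a standard
technique for finding a least common generalization of first-order terms. The
Python below is an illustration under stated assumptions, not the thesis
implementation: input types are assumed to be ground and are encoded as tuples
`(constructor, arg1, ..., argN)`, the function name `lcg` is hypothetical, and
fresh variables are named `g0`, `g1`, and so on.

```python
INT, REAL = ("int",), ("real",)

def arrow(*tys):
    """Right-nested function type: arrow(a, b, c) encodes a -> (b -> c)."""
    *init, result = tys
    for t in reversed(init):
        result = ("->", t, result)
    return result

def lcg(types):
    """Least common generalization of a non-empty list of ground types."""
    table = {}  # tuple of differing subterms -> variable shared by that tuple

    def go(ts):
        first = ts[0]
        if all(t == first for t in ts):
            return first
        same_shape = all(t[0] == first[0] and len(t) == len(first) for t in ts)
        if same_shape:
            return (first[0],) + tuple(
                go([t[i] for t in ts]) for i in range(1, len(first)))
        key = tuple(ts)
        if key not in table:
            table[key] = f"g{len(table)}"   # same disagreement, same variable
        return table[key]

    return go(list(types))

overloads = [arrow(INT, INT, INT), arrow(REAL, REAL, REAL), arrow(INT, REAL, REAL)]
print(lcg(overloads) == arrow("g0", "g1", "g1"))  # True: one variable for the
                                                  # first argument, one for the rest
```

Reusing the same variable whenever the same tuple of subterms disagrees is what
keeps the result least general: the second argument and the result share `g1`
because they differ in exactly the same way across the three assumptions.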

More formally, we say that $\tau$ is a common generalization of some set of finite
types $\tau_1, \ldots, \tau_n$ if there are substitutions $S_1, \ldots, S_n$ such
that $\forall i.\ \tau S_i = \tau_i$. We further say that $\tau$ is a least common
generalization if, for any other common generalization $\tau'$ of
$\tau_1, \ldots, \tau_n$, there exists a substitution $S$ such that
$\tau' S = \tau$. We can extend this principle to constrained type schemes by
applying the concept over the bodies of each constrained type scheme. Least common
generalizations are discussed in [Rey70], which gives an algorithm for computing
them.

$W_o(A, e)$ is defined by cases:

    $e$ is $x$
        if $x$ is overloaded in $A$ with LCG $\forall\bar{\alpha}.\,\tau$,
            return $([\,], \{x : \tau S\}, \tau S)$
            where $S = [\bar{\alpha} := \bar{\beta}]$ and $\bar{\beta}$ are new
        else if $(x : \forall\bar{\alpha}\ \mathbf{with}\ C.\ \tau) \in A$,
            return $([\,], CS, \tau S)$
            where $S = [\bar{\alpha} := \bar{\beta}]$ and $\bar{\beta}$ are new
        else fail.

    $e$ is $\lambda x.\,M$
        let $(S, B, \tau) = W_o(A_x \cup \{x : \alpha\}, M)$ where $\alpha$ is new
        return $(S, B, \alpha S \to \tau)$.

    $e$ is $M\,N$
        let $(S, B, \tau) = W_o(A, M)$
        let $(S', B', \tau') = W_o(AS, N)$
        let $S'' = \mathit{unify}(\tau S', \tau' \to \alpha)$ where $\alpha$ is new
        return $(S S' S'', B S' S'' \cup B' S'', \alpha S'')$.

    $e$ is $\mathbf{let}\ x = M\ \mathbf{in}\ N$
        let $(S, B, \tau) = W_o(A, M)$
        let $(B', \sigma) = \mathit{close}(AS, B, \tau)$
        let $(S', B'', \tau') = W_o(A_x S \cup \{x : \sigma\}, N)$
        return $(S S', B' S' \cup B'', \tau')$.

Figure 4.1: Algorithm $W_o$

Function unify of $W_o$ performs first-order unification of terms in expressions.
In essence, $\mathit{unify}(\tau', \tau'')$ returns a substitution $S$ such that
$\tau' S = \tau'' S$, and fails if no such substitution exists. Formal discussions
of unification are given by Robinson [Rob65] and Knight [Kni89].
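
A minimal sketch of first-order unification in Python is given below to illustrate
the behaviour described above. It is a textbook-style version with an occurs check,
not the SSL code of the implementation. The encoding (a string for a type variable,
a tuple `(constructor, arg1, ..., argN)` for a constructed type) and the helper
names `fn`, `apply_subst` and `occurs` are assumptions of this sketch, and failure
is signalled by returning `None`.

```python
INT, REAL = ("int",), ("real",)

def fn(a, b):
    """Function type a -> b."""
    return ("->", a, b)

def apply_subst(s, ty):
    """Apply substitution s, following chains of variable bindings."""
    if isinstance(ty, str):
        return apply_subst(s, s[ty]) if ty in s else ty
    return (ty[0], *(apply_subst(s, a) for a in ty[1:]))

def occurs(v, ty):
    """True if variable v occurs in type ty."""
    if isinstance(ty, str):
        return v == ty
    return any(occurs(v, a) for a in ty[1:])

def unify(t1, t2, s=None):
    """Return s with apply_subst(s, t1) == apply_subst(s, t2), or None on failure."""
    s = {} if s is None else s
    t1, t2 = apply_subst(s, t1), apply_subst(s, t2)
    if t1 == t2:
        return s
    if isinstance(t1, str):
        return None if occurs(t1, t2) else {**s, t1: t2}
    if isinstance(t2, str):
        return unify(t2, t1, s)
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

# Application case of the inference algorithm: unify the function's type
# t -> t with (argument type -> fresh variable a), here int -> a.
s = unify(fn("t", "t"), fn(INT, "a"))
print(apply_subst(s, "a"))        # ('int',)
print(unify("a", fn("a", "a")))   # None: occurs check fails
print(unify(INT, REAL))           # None: constructor clash
```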

Function close of $W_o$ takes as input $(A, B, \tau)$ and returns a constrained
type scheme for $\tau$. This is accomplished, essentially, by applying the
($\forall$-intro) rule of ML$_o$ to $\tau$. Function satisfy within close checks
for satisfiability of $B$ with respect to $A$. The issue of satisfiability turns
out to be one of the more interesting problems in the ML$_o$ type system. We will
therefore discuss this problem in detail later in this chapter. There is latitude
in how one computes the closure of a type in $W_o$. A basic algorithm for close is
given by Smith [Smi91] which is sufficient to support his soundness and
completeness proofs of $W_o$, but leaves the critical issue of satisfiability
somewhat unresolved. Our implementation of $W_o$ uses an algorithm developed by
Volpano which incrementally determines satisfiability as an expression is being
constructed [Vol93a]. This approach allows us to detect certain type errors, with
respect to constraints, earlier than the alternative approach of delaying
satisfiability checks until the complete expression has been type checked.

We reproduce Volpano's algorithm for $\mathit{close}(A, B, \tau)$ here for the sake
of completeness:


1. Let $V$ be the set of all finite types in $B$. For any two types $\tau_1$ and
   $\tau_2$ in $V$, define an undirected edge $(\tau_1, \tau_2)$ if types $\tau_1$
   and $\tau_2$ share a type variable, and let $E$ be the set of all such edges.

2. Let $B'$ be the set of all constraints $x : \tau'$ in $B$ for which there is no
   type $\tau''$ such that $\tau''$ contains a variable free in $A$ and there is a
   path from $\tau'$ to $\tau''$ in $(V, E)$.

3. If $B'$ is unsatisfiable under $A$ then fail.

4. Let $C$ be the set of all constraints $x : \tau'$ in $B$ for which there is a
   type $\tau''$ such that $\tau$ and $\tau''$ share a type variable and there is a
   path from $\tau'$ to $\tau''$ in $(V, E)$.

5. Return $(B - B', \forall\bar{\alpha}\ \mathbf{with}\ C.\ \tau)$, where
   $\bar{\alpha}$ are the type variables free in $C$ or $\tau$ but not $A$.
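
The graph steps above can be sketched in Python as a reachability computation over
shared type variables. This is an illustrative reading of steps 1, 2, 4 and 5 only:
the satisfiability test of step 3 is omitted, the encoding (a string for a type
variable, a tuple for a constructed type, a constraint as an `(identifier, type)`
pair) is an assumption, and `split_constraints`, `reachable_vars` and `type_vars`
are hypothetical names. The variables `g` and `al` stand for two distinct type
variables.

```python
BOOL = ("bool",)

def arrow(*tys):
    """Right-nested function type: arrow(a, b, c) encodes a -> (b -> c)."""
    *init, result = tys
    for t in reversed(init):
        result = ("->", t, result)
    return result

def type_vars(ty):
    """Set of type variables occurring in a type."""
    if isinstance(ty, str):
        return {ty}
    return set().union(*(type_vars(a) for a in ty[1:]))

def reachable_vars(seeds, types):
    """Variables of every type connected, via shared variables, to a seed."""
    reached, changed = set(seeds), True
    while changed:
        changed = False
        for ty in types:
            vs = type_vars(ty)
            if vs & reached and not vs <= reached:
                reached |= vs
                changed = True
    return reached

def split_constraints(free_in_A, B, tau):
    """Return (B', B - B', C, quantified variables); step 3 is not performed."""
    types = [ty for _, ty in B]
    pinned = reachable_vars(free_in_A, types)              # steps 1 and 2
    B_prime = [c for c in B if not (type_vars(c[1]) & pinned)]
    relevant = reachable_vars(type_vars(tau), types)       # step 4
    C = [c for c in B if type_vars(c[1]) & relevant]
    remaining = [c for c in B if c not in B_prime]         # B - B'
    c_vars = set().union(*(type_vars(c[1]) for c in C))
    quantified = (type_vars(tau) | c_vars) - set(free_in_A)  # step 5
    return B_prime, remaining, C, quantified

# Constraints that share a variable with one free in the assumption set stay in B.
B = [("+", arrow("g", "g", "g")), ("+", arrow("g", "al", "al"))]
print(split_constraints({"al"}, B, arrow("g", ("pair", "g", "al")))[0])  # []
```

With no variables free in the assumption set, the same function places every
constraint in $B'$, which is the situation in a final call to close.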


In steps (1) and (2) we define a graph which connects constraints in $B$ which
share a type variable, and extract types from $B$ which do not overlap on a type
variable. Set $B'$ then contains all of the constraints in $B$ which can be
eliminated, provided they are collectively satisfiable with respect to $A$. If we
assume, as we do in our implementation, that the initial assumption set cannot
contain free type variables, then in the final call to close we are guaranteed that
all constraints in $B$ will be discharged. This approach allows us to perform
satisfiability checks in an incremental manner. We do not eliminate a constraint
from $B$ if it requires us to instantiate a type variable to some finite type; a
subsequent term in the expression may require instantiation of that type variable,
in which case we need to be able to ensure that previous overloading dependencies
are satisfied. Consider the following example, slightly modified from [Vol93a]. If
we have the assumption set

$$
A = \left\{
\begin{array}{l}
b : \mathit{bool}, \\
+ : \mathit{int} \to \mathit{int} \to \mathit{int}, \\
+ : \mathit{int} \to \mathit{real} \to \mathit{real}, \\
{=} : \forall\alpha.\,\alpha \to \alpha \to \mathit{bool}
\end{array}
\right\}
$$

then, recognizing that + has LCG $\forall\alpha.\,\alpha \to \alpha \to \alpha$,
say we have the partial expression

    $\lambda x.\ \mathbf{let}\ y = \lambda z.\,\mathit{pair}(z + z, z + x)\ \mathbf{in}$ <exp>


where <ezp> represents a placeholder. W,, in the process of computing a type for 


y, makes the call, 


close(AU {z: a}, B, y> (7x )), 
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where B is the constraint set, 
B={+:7>7-7, +:y7 aa}. 


Function close determines that B’ is empty, since all constraints in B share a type 


variable, y. So B is determined satisfiable, and close leaves B intact in returning 
(B, Vy with B.y— (yx a)). 


Now, suppose we replace <erp> with the term z = b. This determines the type of 


x to be bool. W, now makes its final call to close for the entire \-expression as 
close(A, B, bool — bool) 

where 
B={+:y>7777, +:7—7 bool > bool}. 


In our final call to close, since our initial assumption set contains no free type vari- 
ables, step (2) of the algorithm discharges all assumptions from B. This final call, 
then, fails since the second constraint on + is unsatisfiable. In the previous call to 
close, if we had discharged the constraints on + by including them in B’, satisfiability 
would be decided by instantiation of y to int and @ to real. As a result, the final 


call to close would succeed, causing an untypable expression to be deemed typable. 
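The five steps of close and the example above can be sketched in Python. This is an illustration only, not the thesis implementation: the tuple encoding of types, the names type_vars, subst and satisfiable, and the brute-force satisfiability check over the three base types are all assumptions made for the sketch.

```python
from itertools import combinations, product

# Hypothetical encoding: ("var", name), ("con", name), ("fun", t1, t2), ("pair", t1, t2).
INT, REAL, BOOL = ("con", "int"), ("con", "real"), ("con", "bool")
G, A_VAR = ("var", "g"), ("var", "a")

def fun(t1, t2):
    return ("fun", t1, t2)

def type_vars(t):
    """Set of type-variable names occurring in type t."""
    if t[0] == "var":
        return {t[1]}
    return set().union(*[type_vars(a) for a in t[1:] if isinstance(a, tuple)])

def subst(t, env):
    """Replace variables in t according to the dict env."""
    if t[0] == "var":
        return env.get(t[1], t)
    return (t[0],) + tuple(subst(a, env) if isinstance(a, tuple) else a for a in t[1:])

# The two overloadings of + from the example assumption set.
PLUS = {("+", fun(INT, fun(INT, INT))), ("+", fun(INT, fun(REAL, REAL)))}

def satisfiable(constraints):
    """Toy check (assumption): try every assignment of base types to the variables."""
    names = sorted(set().union(*[type_vars(t) for _, t in constraints]))
    for choice in product([INT, REAL, BOOL], repeat=len(names)):
        env = dict(zip(names, choice))
        if all((x, subst(t, env)) in PLUS for x, t in constraints):
            return True
    return False

def close(free_in_A, B, tau, satisfiable):
    """B is a list of (identifier, type); free_in_A is the set of variables free in A."""
    n = len(B)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):          # step 1: edges on shared variables
        if type_vars(B[i][1]) & type_vars(B[j][1]):
            adj[i].add(j)
            adj[j].add(i)

    def reachable(seeds):
        seen, stack = set(seeds), list(seeds)
        while stack:
            for k in adj[stack.pop()]:
                if k not in seen:
                    seen.add(k)
                    stack.append(k)
        return seen

    tied_to_A = reachable([i for i in range(n) if type_vars(B[i][1]) & free_in_A])
    B_prime = [B[i] for i in range(n) if i not in tied_to_A]   # step 2
    if not satisfiable(B_prime):                                # step 3
        raise TypeError("unsatisfiable constraints: %r" % (B_prime,))
    tied_to_tau = reachable([i for i in range(n) if type_vars(B[i][1]) & type_vars(tau)])
    C = [B[i] for i in sorted(tied_to_tau)]                     # step 4
    c_vars = set().union(*[type_vars(t) for _, t in C])
    quantified = sorted((c_vars | type_vars(tau)) - free_in_A)  # step 5
    remaining = [B[i] for i in range(n) if i in tied_to_A]
    return remaining, (quantified, C, tau)

B_FIRST = [("+", fun(G, fun(G, G))), ("+", fun(G, fun(A_VAR, A_VAR)))]
TAU_FIRST = fun(G, ("pair", G, A_VAR))
B_FINAL = [("+", fun(G, fun(G, G))), ("+", fun(G, fun(BOOL, BOOL)))]
```

Calling close({"a"}, B_FIRST, TAU_FIRST, satisfiable) keeps both constraints and quantifies only γ, while close(set(), B_FINAL, fun(BOOL, BOOL), satisfiable) raises an error. Note that satisfiable(B_FIRST) on its own succeeds, which is exactly the premature discharge the text warns against.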


A. PARAMETRIC OVERLOADING AND 
SATISFIABILITY 


Typability in ML_o is Turing reducible to the problem of deciding whether a set
of constraints is satisfiable with respect to a given set of type assumptions. Through
the use of constrained type schemes, we can be very expressive in representing
overloadings. It turns out that, unless we restrict our representations to certain kinds of
overloadings, the problem of constraint-set satisfiability, and therefore typability, in
ML_o is undecidable [Smi91].

    /   : int → int → real
    /   : real → real → real
    +   : int → int → int
    +   : real → real → real
    +   : ∀α with + : α → α → α .
              list(α) → list(α) → list(α)
    avg : ∀α with + : α → α → α, / : α → α → real .
              list(α) → real
    avg : ∀α with + : α → α → α, / : α → α → real .
              set(α) → real

Figure 4.2: Infinite and recursive overloadings

Consider the assumption set in Figure 4.2. We can see that the assumptions 
on avg and + contain infinite overloadings, e.g., + can assume a finite type, say 
list(list(...(list(int)))). Note also the occurrences of recursive overloadings, where 
the satisfiability of the constraint set depends on the assumption itself. A mutually 
recursive overloading would result if we added a constraint involving avg to the third 
assumption on +. 

Constraint-set satisfiability remains undecidable in the presence of mutual re- 
cursion and/or straight recursion without restrictions [Vol94a]. We should therefore 
explore suitable bounds on recursion which make our satisfiability problem decidable. 
We can see that recursion is a natural occurrence in practice through our example 
in Figure 4.2. For this reason, while it makes constraint-set satisfiability decidable, 
forbidding recursion entirely is unacceptable. 

Various approaches have been examined. Smith gives a restriction called
overloading by constructors which makes constraint-set satisfiability decidable in
polynomial time [Smi91]. But it disallows constraints on an overloaded identifier x involving
y where x ≠ y. This would prohibit the overloading on avg in Figure 4.2. Another
restriction, similar to that proposed by Kaes [Kae88] and adopted by Haskell [Has89],
is called parametric overloading.

Parametric overloading is a more practical form of overloading which allows
naturally recursive overloading like that of Figure 4.2 and makes constraint-set
satisfiability decidable. This is the form that we adopt in this thesis.
Parametric overloading makes use of the concept of the least common
generalization of finite types discussed earlier. We give a formal definition here from
[Vol94b].

Definition A.1 Parametric assumption sets are defined inductively.

The empty set is parametric.

If A is parametric with no assumption for x and σ is a constrained type scheme
∀ᾱ with C.τ such that for each z : ρ ∈ C, z is overloaded in A and ρ is a generic
instance of its LCG, then A ∪ {x : σ} is parametric.


If A is parametric with no assumption for x and B is the set

    x : ∀γ̄_1 with C_1. τ[α := χ_1(γ̄_1)]
    ...
    x : ∀γ̄_n with C_n. τ[α := χ_n(γ̄_n)]

such that

  • x has LCG ∀α.τ,
  • χ_i ≠ χ_j for i ≠ j (where the χ's are type constructors of various arities), and
  • z : ρ ∈ C_i implies that z has LCG ∀γ.ρ, for some γ ∈ γ̄_i, and either z is
    overloaded in A or z = x,

then A ∪ B is parametric.


Note that we can only specify constraints which involve an overloaded
identifier; constraints involving finite types or even polymorphic types are not allowed
under our definition. Though there are instances where this limits the use of
parametric overloading, the restriction is seldom a limiting factor in practice.
Smith has considered approaches to relaxing this particular restriction for type
checking a language with subtyping and overloading [Smi91, Smi93]. This thesis,
however, considers overloading only. We can also make the observation that an
identifier x parametrically overloaded in A can always be characterized by an LCG which
has only one quantified variable. This gives us a practical view of the restrictions we
are talking about.
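The LCG of a set of finite types can be computed by anti-unification. The following Python sketch is an assumption-laden illustration rather than the thesis's method: types are encoded as strings (arity-0 constructors) or tuples (constructor name followed by arguments), and the generated variable names 'a0, 'a1, ... are hypothetical.

```python
def lcg(types):
    """Least common generalization of a non-empty list of types.

    A type is a string such as "int", or a tuple (constructor, arg1, ..., argn).
    Positions where the inputs disagree are replaced by a variable; the same
    combination of disagreeing subterms always gets the same variable.
    """
    table = {}  # tuple of disagreeing subterms -> shared variable name

    def go(ts):
        first = ts[0]
        if all(t == first for t in ts):
            return first
        same_shape = all(
            isinstance(t, tuple) and isinstance(first, tuple)
            and t[0] == first[0] and len(t) == len(first)
            for t in ts
        )
        if same_shape:
            return (first[0],) + tuple(
                go([t[i] for t in ts]) for i in range(1, len(first))
            )
        key = tuple(ts)
        if key not in table:
            table[key] = "'a%d" % len(table)
        return table[key]

    return go(list(types))


def fun(a, b):
    return ("->", a, b)

# The two finite overloadings of / from Figure 4.2.
LCG_DIV = lcg([fun("int", fun("int", "real")), fun("real", fun("real", "real"))])
```

Here LCG_DIV is the encoding of α → α → real with a single variable, which agrees with the observation that a parametrically overloaded identifier has an LCG with only one quantified variable.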
    Σ = { χ_0 : int, real, bool
          χ_1 : list, ref
          χ_2 : map, pair }

Figure 4.3: Type constructors of various arities


We can characterize parametric overloadings as a regular forest of trees [Vol94b].
These regular forests can be generated by a class of context-free grammars called
regular tree grammars [GeS84]. If A is parametric then every overloaded identifier
x in A has an LCG of the form ∀α.τ, and the set of finite types τ′ to which α can be
instantiated, meaning A ⊢ x : τ[α := τ′] is derivable, forms a regular tree language or
forest.


B. SATISFIABILITY ALGORITHM 


Constraint-set satisfiability is computed by the function satisfiable(A, C), which
takes the assumption set A and the constraint set C as inputs. For any parametric
assumption set A, we can construct for every overloaded identifier x a regular tree
grammar G_x such that if x has LCG ∀α.τ then for any variable-free finite type τ′,
we can derive A ⊢ x : τ[α := τ′] if and only if τ′ ∈ L(G_x), where L(G_x) represents
the regular tree language generated by G_x. In this context, we need only parse τ′
with respect to L(G_x) to determine whether constraint x : τ[α := τ′] is satisfiable
with respect to A.
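Parsing a variable-free type against a regular tree grammar amounts to a recursive membership test. The Python sketch below assumes a hypothetical encoding (a grammar is a dict from non-terminal to productions, and each production pairs a constructor with a list of argument non-terminals); it is not the representation used in the implementation.

```python
# Grammar for + under the assumptions of Figure 4.2: B = int | real | list(B).
G_PLUS = {"B": [("int", []), ("real", []), ("list", ["B"])]}

def derives(grammar, nonterminal, t):
    """True if the non-terminal derives the variable-free type t.

    A type is a pair (constructor, [subterms]), e.g. ("list", [("int", [])]).
    """
    constructor, args = t
    for p_constructor, p_args in grammar[nonterminal]:
        if p_constructor == constructor and len(p_args) == len(args):
            # Every argument must be derivable from the matching non-terminal.
            if all(derives(grammar, nt, a) for nt, a in zip(p_args, args)):
                return True
    return False

LIST_LIST_INT = ("list", [("list", [("int", [])])])
```

With this encoding, derives(G_PLUS, "B", LIST_LIST_INT) holds, so the constraint + : list(list(int)) → list(list(int)) → list(list(int)) would be judged satisfiable, whereas ("bool", []) is rejected.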

An algorithm for satisfiability has been developed based on the property that
regular forests are effectively closed under intersection [Vol94b]. Our implementation
of W_o uses this algorithm. Consider an example using the parametric assumption set
of Figure 4.2 and the type constructors in Figure 4.3, which includes constructors of
arity 0, 1 and 2.

We can see that /, + and avg are overloaded in A with respective LCGs
∀α.α → α → real, ∀α.α → α → α and ∀α.α → real. Our first task is to construct
a dependency graph of assumptions in A; if an assumption on x contains a constraint
on y we need to produce the grammar of y before we produce x's grammar. We then
can proceed to create regular tree grammars for each overloaded identifier in A based
on dependencies. We see that avg depends on / and + in A so we must compute
the grammar for avg last.
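The ordering requirement is a topological sort of the dependency graph. A minimal Python sketch follows, assuming Python 3.9 or later for the standard-library graphlib module; the function name grammar_build_order and the dict encoding are hypothetical. Self-references are dropped because, as described next, a recursive occurrence is represented by the identifier's own start symbol rather than by a separately built grammar.

```python
from graphlib import TopologicalSorter

def grammar_build_order(constraints_of):
    """Order identifiers so that each comes after those its constraints mention.

    constraints_of maps an overloaded identifier to the set of identifiers
    appearing in the constraints of its assumptions.
    """
    graph = {
        x: {y for y in ys if y != x}  # ignore straight recursion
        for x, ys in constraints_of.items()
    }
    # static_order() yields predecessors (dependencies) before dependents and
    # raises graphlib.CycleError if identifiers are mutually recursive.
    return list(TopologicalSorter(graph).static_order())

# Dependencies read off Figure 4.2.
ORDER = grammar_build_order({"/": set(), "+": {"+"}, "avg": {"+", "/"}})
```

For the assumptions of Figure 4.2 this places avg after both / and +.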

Since identifiers may be overloaded recursively, as in our example, we will
represent occurrences of an identifier x in its own constraint list with the start symbol
for G_x. In the case of constrained type schemes with multiple constraints, as occurs
in avg, we will represent this as a new non-terminal. This non-terminal will define
new productions for the grammar which result from the computed intersection of the
constraints. Given a constraint set which contains a constraint on x and a constraint
on y, their intersection is computed as L(G_x) ∩ L(G_y).

We represent the type constructors in Σ as a grammar G_Σ. We can then take
advantage of the fact that L(G_Σ) ∩ L(G_x) = L(G_x) for any overloaded identifier x as
we construct our grammars for A. We therefore obtain the following set of grammars
for our example assumption set A:


    G_Σ   : S = int | real | bool | list(S) |
                ref(S) | S → S | pair(S, S)
    G_/   : A = int | real
    G_+   : B = int | real | list(B)
    G_avg : C = list(D) | set(D)
            D = int | real

where the non-terminal D represents L(G_/) ∩ L(G_+).
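The intersection that yields D can be obtained with the usual product construction on regular tree grammars. The Python sketch below is one way to do it under an assumed encoding (non-terminal mapped to a list of (constructor, argument non-terminals) productions); the names are hypothetical and the thesis implementation may differ.

```python
def intersect(g1, start1, g2, start2):
    """Product grammar generating L(g1, start1) ∩ L(g2, start2).

    Non-terminals of the result are pairs (n1, n2). Only pairs reachable
    from the start pair are built.
    """
    result, todo = {}, [(start1, start2)]
    while todo:
        pair = todo.pop()
        if pair in result:
            continue
        result[pair] = []
        for c1, args1 in g1[pair[0]]:
            for c2, args2 in g2[pair[1]]:
                # Keep a production only when both sides use the same constructor.
                if c1 == c2 and len(args1) == len(args2):
                    children = list(zip(args1, args2))
                    result[pair].append((c1, children))
                    todo.extend(children)
    return result, (start1, start2)

G_DIV = {"A": [("int", []), ("real", [])]}
G_PLUS = {"B": [("int", []), ("real", []), ("list", ["B"])]}
G_D, START_D = intersect(G_DIV, "A", G_PLUS, "B")
```

For G_/ and G_+ the product has the single non-terminal ("A", "B") with productions int and real, matching D = int | real. A product may contain non-terminals that derive nothing, so an emptiness test is still needed before declaring a constraint set satisfiable.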
In our batch implementation of W_o, where we do not allow occurrences of free
type variables in the initial assumption set, we can create the set of regular tree
grammars once and reuse the representation. This was the approach we adopted in
our implementation.

We can now determine satisfiability of a constraint set C with respect to an
assumption set A by parsing each constraint in C, of the form id : τ, with respect
to the grammars computed for A; i.e., if τ parses with respect to L(G_id) for each
constraint in C then C is satisfiable. It is possible, though, that we may encounter
overlapping constraints in C. In this case we must first compute the intersections on
any overlapping constraints before parsing those that don't overlap. If the computed
intersection is empty then C is unsatisfiable. An intersection is empty if there exists
no common type constructor of arity 0 between constraints. For example, grammar
G below represents an empty intersection.

    G = list(G) | ref(G)
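Emptiness of a grammar such as G can be checked by computing which non-terminals are productive, that is, derive at least one finite tree. The Python sketch below uses the standard fixpoint formulation, which is a generalization of the arity-0 criterion stated above and is offered only as an illustration; the encoding and the name is_empty are assumptions.

```python
def is_empty(grammar, start):
    """True if the start non-terminal derives no finite tree.

    grammar maps a non-terminal to a list of (constructor, [argument
    non-terminals]) productions.
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            if nonterminal in productive:
                continue
            # A non-terminal is productive once some production has only
            # productive arguments; arity-0 productions qualify immediately.
            if any(all(a in productive for a in args) for _, args in productions):
                productive.add(nonterminal)
                changed = True
    return start not in productive

G_EMPTY = {"G": [("list", ["G"]), ("ref", ["G"])]}
G_PLUS = {"B": [("int", []), ("real", []), ("list", ["B"])]}
```

G_EMPTY has no arity-0 production to bottom out on, so is_empty(G_EMPTY, "G") holds, while G_PLUS is non-empty.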


This algorithm is exponential in the number of forests input, but this is very
likely the best we can do, for the problem has been shown NP-complete [Vol94b]. The
use of our implementation of W_o should provide valuable insight into determining
whether the NP-completeness of constraint-set satisfiability is a practical limitation.
V. IMPLEMENTATION OF W_o


As we have shown, algorithm W_o has been developed to infer the most general
type of a term given suitable forms of overloading. We envision an interactive
programming environment in which incomplete expressions (which may have
placeholder terms) are type checked and can be subsequently updated, perhaps
requiring new types to be inferred.

In this setting, W_o is unsuitable because it is not incremental. If a function, say
f, is computed on input x, then on input change Δ, we say that the computation of
f(x + Δ) is incremental if f(x + Δ) is computed from only f(x) and Δ. Although our
implementation does not type check definitions, it nonetheless exhibits incremental
type re-computation at the expression level, as we will show.

In efforts to develop an incremental approach to type inference, we might
attempt to re-write W_o. We have, however, chosen an approach which makes use of a
formalism, namely attribute grammars, for achieving incremental type re-computation.
Utilizing this formalism, we foresee our implementation not only providing a means
to validate and explore bounds on the problem of type inference in the presence of
overloading, but also serving as a step towards integrating incremental algorithms for
on-line type inference and those for overloading.


A. THE ROLE OF ATTRIBUTE GRAMMARS 


Updating expressions affords an opportunity to re-use previous type
computation. The attribute grammar formalism provides a framework in which type
re-computation is identified with attribute re-evaluation. So if attribute re-evaluation is
done incrementally then type re-computation is incremental as well and furthermore
it is implicit in the formalism.

Using an attribute grammar we can specify the syntax of a language via a
context-free grammar. Nodes of parse trees are annotated with attributes that are
prescribed by a set of attribute equations given as part of the attribute grammar. If a
parse tree is edited then attributes of the tree are re-computed using the equations so
that a consistent attribution is maintained. Re-computing the attributes is implicit
and is done by the attribute evaluator.

The productions of the context-free grammar for type inference in ML_o which
we have developed for our implementation are given in Figure 5.1. Non-terminals are
represented in upper case while terminals are in lower case. Terminals in productions
that begin with Null represent placeholder terms which have universal type ∀α.α.

Attributes are distinguished as either synthesized or inherited. Within a
production, the attribute equations define the synthesized attributes of the non-terminal
on the left-hand side and the inherited attributes of the non-terminals on the
right-hand side. In other words, in one case attributes are propagated up
(synthesized) in the parse tree and in the other they are propagated down
(inherited) in the parse tree. Figure 5.2 shows the inherited (AI) and synthesized (AS)
attributes associated with the productions of Figure 5.1.

To implement W_o, we define attribute equations, which create dependencies
between attribute values. As the derivation tree is updated these dependencies
determine what part of the tree is affected and where selective re-computation, via the
attribute equations, needs to be done in order to re-establish consistent attribute
values throughout the tree. The set of attribute equations in Figure 5.3 then defines
the dependencies required in each attributed production from Figure 5.1 to
implement W_o. Functional support, indicated by italics, is simplified and represented by
descriptive function names. The attributes S and B of EXP are precisely those terms
returned by W_o as discussed in Chapter IV.
(1)  TOPLEVEL          → ASSUMPTIONSET EXPLIST
(2)  TOPLEVEL          → NullPrgrm

(3)  ASSUMPTIONSET     → ASSUMPTIONLIST
(4)  ASSUMPTIONSET     → NullAssumptions

(5)  ASSUMPTIONLIST_1  → ASSUMPTION ASSUMPTIONLIST_2
(6)  ASSUMPTIONLIST    → NullAssumption
(7)  ASSUMPTION        → ID TYPESCHEMELIST

(8)  ID                → id
(9)  ID                → NullId

(10) TYPESCHEMELIST_1  → TYPESCHEME TYPESCHEMELIST_2
(11) TYPESCHEMELIST    → NullTypeSchemeList

(12) TYPESCHEME        → TYPEVARLIST CONSTRAINTLIST TYPEEXP
(13) TYPESCHEME        → NullTypeScheme

(14) TYPEVARLIST_1     → QUANTTYPEVAR TYPEVARLIST_2
(15) TYPEVARLIST       → NullTypeVarList

(16) QUANTTYPEVAR      → TypeVar
(17) QUANTTYPEVAR      → NullTypeVar

(18) CONSTRAINTLIST_1  → CONSTRAINT CONSTRAINTLIST_2
(19) CONSTRAINTLIST    → NullConstraintList
(20) CONSTRAINT        → ID TYPEEXP
(21) CONSTRAINT        → NullConstraint

(22) TYPEEXP           → UniversalType
(23) TYPEEXP           → Int
(24) TYPEEXP           → Real
(25) TYPEEXP           → Bool
(26) TYPEEXP           → TypeVar
(27) TYPEEXP           → NullType
(28) TYPEEXP_1         → Map(TYPEEXP_2, TYPEEXP_3)
(29) TYPEEXP_1         → Pair(TYPEEXP_2, TYPEEXP_3)
(30) TYPEEXP_1         → List(TYPEEXP_2)
(31) TYPEEXP_1         → Ref(TYPEEXP_2)
(32) TYPEEXP_1         → [illegible]

(33) EXPLIST_1         → EXP EXPLIST_2
(34) EXPLIST           → NullExpression

(35) EXP               → ID
(36) EXP_1             → EXP_2 EXP_3
(37) EXP_1             → λID.EXP_2
(38) EXP_1             → let ID = EXP_2 in EXP_3
(39) EXP               → NullExp

Figure 5.1: Context-free grammar for ML_o type inference
AI_TOPLEVEL       = {}                       AS_TOPLEVEL       = {}
AI_ASSUMPTIONSET  = {}                       AS_ASSUMPTIONSET  = {typeEnv}
AI_ASSUMPTIONLIST = {}                       AS_ASSUMPTIONLIST = {typeEnv}
AI_ASSUMPTION     = {}                       AS_ASSUMPTION     = {typeEnv}
AI_ID             = {}                       AS_ID             = {name}
AI_EXPLIST        = {typeEnv, typeGrammar}   AS_EXPLIST        = {}
AI_EXP            = {typeEnv, typeGrammar}   AS_EXP            = {S, B, typeAssignment}

Figure 5.2: Inherited and synthesized attributes of implementation grammar


To illustrate how incremental type recomputation is achieved via incremental
attribute evaluation, consider Figure 5.4. Here we have a partial derivation tree
annotated with a dependence graph showing the propagation of attributes in the
tree. For simplicity, we have chosen one inherited attribute and one synthesized
attribute. The inherited attribute A represents an assumption set. The synthesized
attribute T is a constrained type scheme representing the type of an expression
at each node of the tree. Figure 5.4 represents the partial derivation tree for the
expression pr(x, λy.λz. y z), where pr is of type ∀α, β. α → β → pair(α, β).
Suppose the expression rooted at node n_3 is updated. We can see that T at node n_2
now must be recomputed, but notice that no change has been made to the expression
rooted at node n_4, which therefore need not be retypechecked. In practice, this can
result in significant savings as the tree whose root is n_4 can be arbitrarily large.
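The saving described above can be illustrated with a toy attributed tree in Python. This is a deliberately simplified sketch and not how the Synthesizer Generator works internally: it caches a single synthesized attribute (the subtree size stands in for T), ignores inherited attributes, and invalidates only the path from an edited node to the root. The class Node and its fields are hypothetical.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        self.cache = None  # cached value of the synthesized attribute
        self.evals = 0     # number of times the attribute was (re)computed here

    def syn(self):
        """Synthesized attribute; subtree size stands in for the type T."""
        if self.cache is None:
            self.evals += 1
            self.cache = 1 + sum(child.syn() for child in self.children)
        return self.cache

    def replace_child(self, index, new_child):
        """Edit the tree and invalidate cached attributes on the path to the root."""
        self.children[index] = new_child
        new_child.parent = self
        node = self
        while node is not None:
            node.cache = None
            node = node.parent

# Tree shaped like Figure 5.4: n1 -> n2 -> (n3, n4).
n4 = Node("lambda", [Node("y z")])
n3 = Node("app", [Node("pr"), Node("x")])
n2 = Node("app", [n3, n4])
n1 = Node("exp", [n2])
n1.syn()                                               # initial attribution
n2.replace_child(0, Node("app", [Node("pr"), Node("w")]))  # update the subtree at n3
TOTAL = n1.syn()                                       # re-evaluate after the edit
```

After the edit, n1 and n2 have each been evaluated twice, while n4 has still been evaluated only once, however large its subtree is.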
1. The Synthesizer Generator Platform 


An attribute evaluator generator takes as input a set of attribute equations, 
such as those in Figure 5.3, for a set of terms and outputs an attribute evaluator that 
takes a term and annotates it with an attribution as prescribed by the equations. 
There are attribute evaluator generators available today that not only output an 
attribute evaluator but output one that evaluates attributes incrementally. One 


such generator is GrammaTech’s Synthesizer Generator (SynGen). 
(1)  EXPLIST.typeEnv = InitialEnv() @ ASSUMPTIONSET.typeEnv
     EXPLIST.typeGrammar = ComputeGrammar(ASSUMPTIONSET.typeEnv)

(3)  ASSUMPTIONSET.typeEnv = ASSUMPTIONLIST.typeEnv
(4)  ASSUMPTIONSET.typeEnv = NullTypeEnv
(5)  ASSUMPTIONLIST_1.typeEnv = ConcatEnv((ASSUMPTION.name, ASSUMPTION.type),
                                          ASSUMPTIONLIST_2.typeEnv)
(6)  ASSUMPTIONLIST.typeEnv = NullTypeEnv

(7)  ASSUMPTION.name = ID.name
     ASSUMPTION.type = TYPESCHEMELIST

(8)  ID.name = id
(9)  ID.name = "undeclared"

(33) EXP.typeEnv = EXPLIST_1.typeEnv
     EXP.typeGrammar = EXPLIST_1.typeGrammar
     EXPLIST_2.typeEnv = EXPLIST_1.typeEnv
     EXPLIST_2.typeGrammar = EXPLIST_1.typeGrammar

(35) EXP.typeAssignment = ComputeType(ID.name, EXP.typeEnv)

(36) EXP_1.S = let V = Unify((EXP_3.S EXP_2.typeAssignment),
                             (EXP_3.typeAssignment → NewVar(beta))) in
               V (EXP_3.S EXP_2.S)
     EXP_1.typeAssignment = V beta
     EXP_1.B = (V (EXP_3.S EXP_2.B)) @
               (V EXP_3.B)
     EXP_2.typeEnv = EXP_1.typeEnv
     EXP_3.typeEnv = EXP_2.S EXP_1.typeEnv
     EXP_2.typeGrammar = EXP_1.typeGrammar
     EXP_3.typeGrammar = EXP_1.typeGrammar

(37) EXP_1.typeAssignment = (EXP_2.S NewVar(beta)) → EXP_2.typeAssignment
     EXP_1.S = EXP_2.S
     EXP_1.B = EXP_2.B
     EXP_2.typeEnv = ConcatEnv((ID.name, beta), EXP_1.typeEnv)
     EXP_2.typeGrammar = EXP_1.typeGrammar

(38) let (B′, σ) = Close((EXP_2.S EXP_1.typeEnv), EXP_2.B,
                         EXP_2.typeAssignment, EXP_1.typeGrammar)
     EXP_1.typeAssignment = EXP_3.typeAssignment
     EXP_1.S = (EXP_3.S EXP_2.S)
     EXP_1.B = (EXP_3.S B′) @ EXP_3.B
     EXP_2.typeEnv = EXP_1.typeEnv
     EXP_3.typeEnv = ConcatEnv((ID.name, σ), (EXP_2.S EXP_1.typeEnv))
     EXP_2.typeGrammar = EXP_1.typeGrammar
     EXP_3.typeGrammar = EXP_1.typeGrammar

(39) EXP.typeAssignment = NewVar(beta)
     EXP.S = NullSubst
     EXP.B = NullConstraintList

Figure 5.3: Attribute equations for ML_o type inference
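Equation (36) relies on unification of the function's type with an arrow type built from the argument's type and a fresh variable. The following Python sketch shows standard first-order unification with an occurs check; it is not the Unify support function of the implementation, and the encoding (variables are strings beginning with an apostrophe, constructors are tuples) and the names walk, occurs, unify and apply_subst are assumptions.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def walk(t, s):
    """Follow variable bindings in substitution s until t is unbound or a non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(t1, t2, s=None):
    """Return a substitution (dict) unifying t1 and t2, or raise TypeError."""
    s = dict(s or {})
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a) or is_var(b):
            var, other = (a, b) if is_var(a) else (b, a)
            if occurs(var, other, s):
                raise TypeError("occurs check failed for %s" % var)
            s[var] = other
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and a[0] == b[0] and len(a) == len(b)):
            stack.extend(zip(a[1:], b[1:]))
        else:
            raise TypeError("cannot unify %r with %r" % (a, b))
    return s

def apply_subst(s, t):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])
    return t

def fun(a, b):
    return ("->", a, b)

# Application of a function of type 'a -> (int -> 'a) to an argument of type real:
# unify the function type with (real -> 'b), where 'b is the fresh variable.
V = unify(fun("'a", fun("int", "'a")), fun("real", "'b"))
RESULT = apply_subst(V, "'b")
```

RESULT is the encoding of int → real, the type of the application. The sketch leaves out the constraint sets B that W_o threads alongside the substitution.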





                     n_1  EXP  (inh A, syn T)
                              |
                     n_2  APP  (inh A, syn T)
                    /                        \
    n_3  APP  (inh A, syn T)          n_4  EXP  (inh A, syn T)
         /        \                            |
       pr          x                     (λy.λz. y z)

Figure 5.4: A partial derivation tree and dependence graph for pr(x, λy.λz. y z)
We have developed our implementation utilizing SynGen for several reasons.
We are able to get a comprehensive and visually appealing X-windows interface with
relative ease. Utilizing SynGen also fits nicely with our view of attribute grammars
as a desirable approach to achieving incremental type inference. Furthermore,
since we profit by the incremental algorithms embedded in SynGen, new advances
in this area, which may well be incorporated in future versions, will directly enhance
our implementation.

The incremental algorithms used in SynGen rely heavily on the concept of
ordered attribute grammars, which were introduced in [Kas80]. The ordered attribute
grammars are a subclass of the noncircular attribute grammars. Though SynGen
can accept attribute grammars which are not ordered, it prohibits circular attribute
grammars.

The language of SynGen is SSL. Every (useful) SSL specification has three
major declaration areas: Abstract syntax, which defines a set of grammar rules;
Attribution, which annotates the grammar with attributes and describes their
dependencies; and Unparsing, which defines display formats for terms, identifies selectable
productions of the grammar and annotates which productions are editable. For our
implementation, Figure 5.1 represents the Abstract syntax and Figures 5.2 and 5.3
represent the Attribution.
2. The Implementation 


We demonstrate our implementation through an annotated sequence of 
actual X-windows display screens generated by our type checker. Figure 5.5 shows 
an initial screen with placeholders for an assumption set entry, where we define 
extensions to an Initial Environment, and an expression. The currently selected term, 
corresponding to the ASSUMPTIONSET production of our grammar, is underlined. 


Note that the type inferred for the placeholder term <exp> is <universal type>. 
[Screen: menu bar File, Edit, View, Options, Structure, Text, Help]

    ASSUMPTIONS:

    EXPRESSIONS:

    <exp>
    TYPE: <universal type>

    Context: AssumptionSet

Figure 5.5: Implementation initial screen.
[Screen: menu bar File, Edit, View, Options, Structure, Text, Help]

    ASSUMPTIONS:

    expon : ∀α with (mult:(α → (α → α))) . (α → (int → α));
    eq?   : ∀α with (eq?:(α → (α → bool))) . (α → (α → bool));
          : ∀α with () . (list(α) → (list(α) → bool));
          : (int → (int → bool));
    mult  : ∀α with (mult:(α → (α → α))) . (α → (α → α));
          : (int → (int → int));
          : (real → (real → real));
          : ∀α with (mult:(α → (α → α)), eq?:(α → (α → bool))) . (list(α) → (list(α) → list(α)));
          : <type>

    EXPRESSIONS:

    <exp>
    TYPE: <universal type>

    Context: TYPEEXP   [transforms for type variables and the type constructors int, real, bool, map, list, pair, ...]

Figure 5.6: An assumption set defined.


We have entered an assumption set in Figure 5.6 with three overloaded 
identifiers. The first type scheme for each identifier, without the constraints, must 
represent the LCG of that identifier. The implementation currently does not com- 
pute the LCG and so it must be provided by the user. Note the terms enclosed in 
boxes at the bottom of the screen. These are called transforms. With the placeholder 
for TYPEEXP selected, we may select a transform with the mouse and replace the 
selected placeholder term with a term associated with the transform. This provides 
an alternative means to enter terms without the need for the user to remember the 
appropriate syntax. Users may also enter terms directly as long as the term being 


edited is defined in the unparsing rules. 
In Figure 5.7 we have entered three expressions whose types have been
inferred. The type of our first expression is represented by a constrained type scheme;
it is the most general type we can give to it and we can be no more specific without
more information. In the second expression, where r : real is defined in the initial
environment, we see that, since mult is defined over reals, applying expon to r satisfies
the constraint on expon and we are able to infer a finite type for the expression. An
unsatisfiable constraint has been encountered in our final expression. This is a result
of the multiple constraints on mult and eq? in the third assumption of mult. We can
see that the grammar for mult is:

    G_mult : S = int | real | list(U)
             U = int | list(U)

which clearly does not derive list(real).

It is also possible to directly examine the attributes of the parse tree at
any point in the execution. This functionality, though mainly useful for debugging,
can provide a means to investigate aspects of the implementation from a lower level
viewpoint. For example, one might wish to examine a representation of the regular
tree grammars produced for overloaded identifiers in a given assumption set. This
can be done by examining the attribute typeGrammar at any EXP node of the
parse tree. For instance, Figure 5.8 shows the regular tree grammars computed for
the assumption set of Figure 5.7. Note that we have chosen to represent the start
symbol of grammar G_id as id, for each overloaded identifier id in the assumption
set. In addition, id1_*_id2 was chosen to represent L(G_id1) ∩ L(G_id2).

We have given a brief overview, through examples, of the X-windows
interface and general functionality of our implementation of W_o with parametric
overloading. By examining instances of type inference in ML_o, in the setting of our
implementation, we have endeavored to provide the reader with a clearer
understanding of concepts discussed more formally in previous chapters.
[Screen: menu bar File, Edit, View, Options, Structure, Text]

    ASSUMPTIONS:

    expon : ∀α with (mult:(α → (α → α))) . (α → (int → α));
    eq?   : ∀α with (eq?:(α → (α → bool))) . (α → (α → bool));
          : ∀α with () . (list(α) → (list(α) → bool));
          : (int → (int → bool));
    mult  : ∀α with (mult:(α → (α → α))) . (α → (α → α));
          : (int → (int → int));
          : (real → (real → real));
          : ∀α with (mult:(α → (α → α)), eq?:(α → (α → bool))) . (list(α) → (list(α) → list(α)))

    EXPRESSIONS:

    mult
    TYPE: ∀(α) with (mult:(α → (α → α))) . (α → (α → α));

    expon r
    TYPE: (int → real);

    mult (list r) (list r)
    TYPE: <constraint error> →
          (mult:(list(real) → (list(real) → list(real))))
          is unsatisfiable.

Figure 5.7: Type inference of three expressions.
[Screen (read-only): menu bar File, Edit, View, Options, Structure, Text, Help]

    eq?_*_mult: (list(eq?_*_mult) | int);

Figure 5.8: Representation of attribute typeGrammar
VI. CONCLUSIONS


We have considered the problem of type inference in an extension to the type
system ML called ML_o. The type system ML_o is a formalism which is more suitable
for implementing future languages by virtue of its incorporation of global overloading.
Yet this increased functionality introduces new problems in developing algorithms
which make typability decidable. Without restrictions on the types of overloadings
and the structure of constraint sets, typability in ML_o is undecidable. Typability
in ML_o is Turing reducible to the problem of determining if a set of constraints
is satisfiable with respect to a given set of assumptions. If assumption sets are
restricted to parametric overloadings, the problem of constraint-set satisfiability is
NP-complete.

The type inference algorithm W_o with parametric overloading has been
implemented utilizing the formalism of attribute grammars with GrammaTech's
Synthesizer Generator. It performs type inference on expressions in an interactive
environment. Type inference is performed incrementally so that the types of partial
expressions can be inferred and efficiency of re-computation in the presence of
updates is enhanced. Consequently, immediate feedback is provided to the user as
expressions are entered and updated.

Our implementation will be used to examine the practical bounds on the
problem of constraint-set satisfiability. It will also represent a significant tool for
exploring the limits of bounded polymorphism, or overloading, in programming languages.
Can we devise new forms of overloading which are more flexible than parametric
overloading yet retain a decidable satisfiability problem?





A. FUTURE WORK 


This thesis will serve as a basis for further research aimed at ultimately
developing a type discipline for a class of implicitly-typed imperative programming
languages with subtypes, overloading and polymorphism. A more immediate goal
is to merge our implementation of W_o with an SSL implementation that performs
on-line type inference utilizing the type inference algorithm W of ML.

On-line type inference allows the introduction of new global definitions as a
program is produced. This differs from our batch implementation, where we have
assumptions about types of free ids available to each expression in the form of an
assumption set. The incorporation of overloading in an on-line implementation will
be the subject of the next step in this research effort. This will produce an interactive
environment where global definitions, perhaps overloaded, may be introduced at any
point in the program. Types of all dependent terms are then recomputed as a result
of these new definitions.




LIST OF REFERENCES 


ASU86] Aho, A., Sethi, R. and Ullman, J.: Compilers: Principles, Techniques, and 
Tools, Addison-Wesley Publishing Company, 1986. 


[(Car87 Cardelli, L.: Basic polymorphic typechecking, Science of Computer Pro- 
gramming, 8, pp. 147-172, 1987. 


[CHO92] Chen, K., Hudak, P. and Odersky, M.: Parametric Type Classes, Proc. 7th 
ACM Conf. on Lisp and Functional Programming, pp. 170-181, 1992. 





CDO91] Cormack, G. Duggan, D. and Ophel, J.: Decidable Type Reconstruction 
with Recursive Overloading (Extended Abstract), Department of Computer 
Science, University of Waterloo, 1991. 


[DaM82] Damas, L. and Milner, R.: Principal Type Schemes for Functional Pro- 
grams, Proc. 9th ACM Symposium on Principles of Programming Lan- 
guages, pp. 207-212, 1982. 


GeS84] Gecseg, F. and Steinby M.: Tree Automata, Akademiai Kiado, Budapest 
Hungary, 1984. 


[Gun92] Gunter, C.: Semantics of Programming Languages, Structures and Tech- 
niques, The MIT Press, 1992. 


[Has89] Report on the Programming Language Haskell, Version 1.0, April 1990. 





[HMM86] Harper, R., MacQueen, D. and Milner, R.: Standard ML. Technical Re- 
port ECS-LFCS-86-2, Department of Computer Science, University of Ed- 
inburgh, 1986. 


[Jon92] Jones, M.: A theory of qualified types, Proc. 4th European Symposium on 
Programming, LNCS 582, Springer-Verlag, pp. 287-306, 1992. 


[Kae88] Kaes, S.: Parametric Overloading in Polymorphic Programming Lan- 
guages, Proc. 2nd European Symposium on Programming, LNCS 300, 
Springer-Verlag, pp. 131-144, 1988. 


[Kae92| Kaes, S.: Type Inference in the Presence of Overloading, Subtyping, and 
Recursive Types, Proc. 7th ACM Conf. on Lisp and Functional Program- 
ming, pp. 193-204, 1992. 


41 


[Kas80] Kastens, U.: Ordered attribute grammars, Acta Inf., 13,3: pp. 229-256, 
1980. 


[Kni89] Knight, K.: Unification: A multidisciplinary survey. ACM Computing Sur- 
veys, 21(1): pp. 93-124, March 1989. 


[KT90] Kfoury, A. and Tiuryn, J.: Type reconstruction in finite-rank fragments of 
the polymorphic λ-calculus. Fifth IEEE Symposium on Logic in Computer 
Science, pp. 2-11, 1990. 


[Lei83] Leivant, D.: Polymorphic Type Inference, 10th ACM Symposium on Prin- 
ciples of Programming Languages, pp. 88-98, Austin, Texas, 1983. 


[McC84] McCracken, N.: The Typechecking of Programs with Implicit Type Struc- 
ture, Semantics of Data Types, LNCS 173, pp. 301-315, 1984. 


[Mil78] Milner, R.: A Theory of Type Polymorphism in Programming, J. of Com- 
puter and System Sciences, 17, pp. 348-375, 1978. 


[NiP93] Nipkow, T. and Prehofer, C.: Type Checking Type Classes, Proc. 20th 
ACM Symposium on Principles of Programming Languages, pp. 409-418, 
1993. 


[Rey70] Reynolds, J.C.: Transformational Systems and the Algebraic Structure of 
Atomic Formulas, Machine Intelligence, 5, pp. 135-151, 1970. 


[Rob65] Robinson, J.A.: A machine-oriented logic based on the resolution principle, 
Journal of ACM, 12:1, pp. 23-41, 1965. 


[Smi91] Smith, G.S.: Polymorphic Type Inference for Languages with Overloading 
and Subtyping, Ph.D. Thesis, Department of Computer Science, Cornell 
University, Technical Report 91-1230, 1991. 


[Smi93] Smith, G.S.: Polymorphic Type Inference with Overloading and Subtyping, 
Proc. TAPSOFT ’93, LNCS 668, Springer-Verlag, pp. 671-685, 1993. 


[Tiu90] Tiuryn, J.: Type Inference Problems: A Survey, Proc. Mathematical Foun- 
dations of Computer Science, LNCS 452, Springer-Verlag, pp. 105-120, 
1990. 


[Tho91] Thompson, S.: Type Theory and Functional Programming, Addison- 
Wesley Publishing Company, 1991. 


[Tur86] Turner, D.: An Overview of Miranda, ACM SIGPLAN Notices, 21, pp. 
156-166, 1986. 


[VoS91] Volpano, D.M. and Smith, G.S.: On the Complexity of ML Typability with 
Overloading, Proc. 5th Conf. on Functional Programming Languages and 
Computer Architecture, LNCS 523, Springer-Verlag, pp. 15-28, 1991. 


[Vol94a] Volpano, D.M.: Haskell-Style Overloading is NP-hard, Proc. 5th IEEE Intl. 
Conf. on Computer Languages, Toulouse, France, to appear, May 1994. 


[Vol94b] Volpano, D.M.: Parametric Overloading and the Computational Complex- 
ity of Satisfiability, submitted for publication. 


[Vol93a] Volpano, D.M.: Basic Polymorphic Type Checking with Overloading, un- 
published manuscript. 


[Vol93b] Volpano, D.M.: A Critique of Type Systems for Global Overloading, Naval 
Postgraduate School Technical Report NPSCS-94-002. 


[WaB89] Wadler, P. and Blott, S.: How to make ad-hoc polymorphism less ad-hoc, 
Proc. 16th ACM Symposium on Principles of Programming Languages, pp. 
60-76, 1989. 


INITIAL DISTRIBUTION LIST 


Defense Technical Information Center 
Cameron Station 
Alexandria, VA 22304-6145 


Dudley Knox Library 
Code 52 
Naval Postgraduate School 
Monterey, CA 93943-5101 


Chairman, Computer Science Department 
Code CS 
Naval Postgraduate School 
Monterey, CA 93943 


Dr. Dennis M. Volpano 
Code CS/Vo 
Naval Postgraduate School 
Monterey, CA 93943 


Dr. Craig W. Rasmussen 
Code MA/Ra 
Naval Postgraduate School 
Monterey, CA 93943 


Bruce J. Bull 
1106 Spruance Road 
Monterey, CA 93940 

